
As shown in Figure 5.27, the heatmap clusters genes and samples based on the Euclidean distance between the expression values. As expected, samples from the same group are clustered together.
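As a minimal illustration of why samples from the same group end up next to each other, the Python sketch below computes pairwise Euclidean distances between samples. The sample names and expression values are hypothetical and were chosen only so that the two groups differ; the clustering step of a real heatmap then merges the closest items first.

```python
from itertools import combinations
from math import dist

# Hypothetical log-expression values: one list of four genes per sample.
samples = {
    "control_1": [2.1, 5.0, 7.9, 1.2],
    "control_2": [2.3, 4.8, 8.1, 1.0],
    "treated_1": [6.0, 1.1, 3.2, 4.9],
    "treated_2": [5.8, 1.3, 3.0, 5.2],
}

# Euclidean distance between every pair of samples.
distances = {
    (a, b): dist(samples[a], samples[b])
    for a, b in combinations(samples, 2)
}

# The pair with the smallest distance would be merged first.
closest_pair = min(distances, key=distances.get)

# List the pairs from most similar to least similar.
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 2))
```

With these toy values, the within-group distances are far smaller than the between-group distances, which is the pattern the heatmap dendrogram reflects.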

5.3.7.9  Ontology and Pathways

After identifying the differentially expressed genes, the next step is to study the functions of these genes, their pathways, and the conditions associated with them, drawing on the knowledge accumulated from previous studies and discoveries. This knowledge is available in databases such as GO [37] and KEGG [38], as well as in other pathway databases. Many of the genes associated with a given biological process are differentially expressed in a given condition, such as a disease. By performing GO analysis, we can identify the biological processes, cellular locations, and molecular functions that are impacted by the condition studied. GO attempts to capture three aspects of a gene: (i) biological processes (BP) that the gene may be involved in, (ii) molecular functions (MF), and (iii) cellular components (CC), where the biological processes and molecular activities take place in the cell. It is important to know that GO terms aim to describe the normal functions, processes, or locations that gene products are involved in. They do not capture pathological processes, experimental conditions, or temporal information.

Given a set of differentially expressed genes (upregulated or downregulated), GO and KEGG analyses identify the GO terms and pathways, respectively, that are over-represented among those genes. EdgeR uses the “goana” function for GO analysis and the “kegga” function for KEGG analysis. Both functions require a DGELRT object and Entrez Gene identifiers (IDs) to annotate the genes. The NCBI Entrez Gene IDs must be present as the row names, as we set them above. It is also important to specify the species studied. The following script performs GO analysis and annotates the significantly differentially expressed genes (downregulated and upregulated) with GO terms:

FIGURE 5.28  Ontology annotation of the significantly expressed genes.
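To make the enrichment step concrete, the test behind a GO or KEGG over-representation analysis can be sketched in a few lines of Python. This is an illustration and not edgeR's code. It assumes the plain case, with no adjustment for gene length or abundance, in which asking whether a term is over-represented among the differentially expressed genes reduces to a one-sided hypergeometric test. The gene IDs, the term annotations, and the function name `enrichment_pvalue` are all hypothetical.

```python
from math import comb


def enrichment_pvalue(universe, de_genes, term_genes):
    """Return (k, p): k is the number of DE genes annotated to the term and
    p is the one-sided hypergeometric probability of seeing k or more."""
    universe = set(universe)
    de = set(de_genes) & universe
    term = set(term_genes) & universe

    n_universe = len(universe)  # all genes that were tested
    n_term = len(term)          # tested genes annotated to the term
    n_de = len(de)              # differentially expressed genes
    k = len(de & term)          # DE genes annotated to the term

    # Sum the hypergeometric probabilities for k, k + 1, ... overlaps.
    total = comb(n_universe, n_de)
    upper = min(n_term, n_de)
    tail = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_de - i)
        for i in range(k, upper + 1)
    )
    return k, tail / total


# Hypothetical data: 20 tested genes, 5 of them differentially expressed.
universe = [f"g{i}" for i in range(1, 21)]
de_genes = ["g1", "g2", "g3", "g4", "g5"]

# Two made-up terms: one shares four genes with the DE set, one shares none.
term_a = ["g1", "g2", "g3", "g4", "g10", "g11"]
term_b = ["g15", "g16", "g17", "g18", "g19", "g20"]

k_a, p_a = enrichment_pvalue(universe, de_genes, term_a)
k_b, p_b = enrichment_pvalue(universe, de_genes, term_b)
print("term_a:", k_a, p_a)
print("term_b:", k_b, p_b)
```

In this toy example, four of the five differentially expressed genes carry `term_a`, giving a small probability (about 0.014), whereas `term_b` shares no genes with the set and gets a probability of 1. A real analysis repeats this test for every term and then ranks the terms by p-value.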